Introduction to Medical Statistics 2024
Exercises Class III
Statistical Analysis. Main Concepts and Principles
I. Calculation of normal probabilities
In R, the function pnorm(X, mean, sd) returns the probability P(x<X) for a Normal distribution with mean mean and standard deviation sd.
Assume that the probability distribution of a laboratory marker in a population has a normal distribution with mean 75 and standard deviation 15. Use the 68-95-99.7-rule to (approximately) calculate the following probabilities by hand. Use the pnorm function in R to confirm your approximations.
- P(60<x<90)
- P(x<60)
- P(x>90)
- P(x<60 or x>105)
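The following is a minimal sketch of how pnorm can be combined to check the hand calculations. It only uses the mean of 75 and standard deviation of 15 given above; the variable names (mu, sigma, p_within and so on) are illustrative choices, not part of the exercise.

```r
# Laboratory marker: Normal with mean 75 and standard deviation 15
mu <- 75
sigma <- 15

# P(60 < x < 90): difference of two cumulative probabilities
p_within <- pnorm(90, mean = mu, sd = sigma) - pnorm(60, mean = mu, sd = sigma)

# P(x < 60): lower tail
p_below <- pnorm(60, mean = mu, sd = sigma)

# P(x > 90): upper tail, requested with lower.tail = FALSE
p_above <- pnorm(90, mean = mu, sd = sigma, lower.tail = FALSE)

# P(x < 60 or x > 105): the two events are disjoint, so their probabilities add
p_either <- p_below + pnorm(105, mean = mu, sd = sigma, lower.tail = FALSE)

round(c(within = p_within, below = p_below, above = p_above, either = p_either), 4)
```

Because 60, 90 and 105 lie one or two standard deviations from the mean, each result should be close to the value suggested by the 68-95-99.7-rule.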
II. Distribution of the sample mean
Assume that the probability distribution of a laboratory marker in a population has a normal distribution with mean 75 and standard deviation 15 and that we measure this marker in 144 individuals. We compute the mean of those 144 values.
- What is the distribution of this sample mean? Give the mean as well as the standard deviation.
- Assume that the true population mean is \(\mu\) = 75. We now collect a sample of size 144. The mean of this sample is M. What is the probability that this sample mean M differs by more than 2 units from the population mean \(\mu\) = 75?
- We are running into a problem. We failed to collect the planned sample size of N = 144. In fact, we only managed to collect a very small handful of samples. That means we can no longer rely on the Normal approximation and have to use a t distribution. In R, the function to calculate a cumulative probability of a t distribution is pstudent_t(X, df, mu, sigma) (instead of pnorm(X, mean, sd) as we have seen before). Recalculate the probability in question 2 for the scenarios below. Compare the results with what you obtained from the Normal distribution. What can you conclude?
- N = 4
- N = 9
- N = 25
- N = 144 (but don’t use the Normal approximation.)
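The comparison between the Normal and the t calculation can be sketched as follows. This is one possible reading of the exercise, under two assumptions that the text does not state explicitly: the t distribution has N - 1 degrees of freedom, and its scale is the standard error 15 / sqrt(N). The sketch uses base R's pt on the standardised scale; if pstudent_t is the location-scale t distribution function (as provided, for example, by the ggdist package), then pstudent_t(X, df, mu, sigma) corresponds to pt((X - mu) / sigma, df). The helper names se_mean, p_normal and p_t are hypothetical.

```r
mu <- 75        # population mean
sigma <- 15     # population standard deviation
delta <- 2      # deviation of the sample mean from mu that we ask about

# Standard error of the sample mean for a sample of size n
se_mean <- function(n) sigma / sqrt(n)

# Normal model: P(|M - mu| > delta), using the symmetry of the distribution
p_normal <- function(n) 2 * pnorm(mu - delta, mean = mu, sd = se_mean(n))

# t model with n - 1 degrees of freedom (assumption), on the standardised scale
p_t <- function(n) 2 * pt(-delta / se_mean(n), df = n - 1)

n <- c(4, 9, 25, 144)
data.frame(n = n, se = se_mean(n), normal = p_normal(n), t = p_t(n))
```

The t distribution has heavier tails than the Normal distribution, so for the same sample size the t-based probability is the larger of the two, and the gap shrinks as N grows.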
III. Confidence interval for a proportion
We continue with the exercise from this morning, in which we considered a new chemotherapy. In a trial of 50 patients, 16 had a tumor response. This morning we tested the null hypothesis that the new therapy does not increase the tumor response probability to a value higher than the 20% of the currently available chemotherapies. Now we focus on the 95% confidence interval.
Calculate a two-sided 95% confidence interval for the true tumor response probability. In \(\textsf{R}\), this is done via function prop.test. Also use the fact that the confidence interval is approximately equal to the estimate \(\pm 2 \times\) SE, with SE the standard error of the sample proportion.
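A minimal sketch of both calculations, using only the 16 responses out of 50 patients given above. The variable names are illustrative. Note that prop.test does not use the estimate plus or minus 2 times SE formula: it reports a score-based (Wilson) interval with a continuity correction by default, so the two intervals are expected to differ slightly.

```r
x <- 16   # patients with a tumor response
n <- 50   # patients in the trial

p_hat <- x / n                        # estimated response probability
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion

# Approximate 95% confidence interval: estimate +/- 2 * SE
ci_by_hand <- c(lower = p_hat - 2 * se, upper = p_hat + 2 * se)

# Interval reported by prop.test (score interval, continuity-corrected by default)
ci_prop_test <- prop.test(x, n)$conf.int

ci_by_hand
ci_prop_test
```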
IV. Inference on the change in white cell count
The dataset bmData.csv contains selected variables from 300 patients with confirmed bacterial meningitis. They were randomized to either adjunctive dexamethasone therapy or placebo. In this exercise we study the change in CSF total white cell count between baseline and follow-up in the dexamethasone arm.
- Have a look at the description of the dataset and import the dataset bmData.csv into R with variable name bmData. Use tbl_summary to summarise the dataset (except patId) by treatment group (variable group).
- Create a dataset that restricts observations to the dexamethasone group (e.g. using the subset or filter function; look at the examples in the help file if you need to learn how to use it).
- Make a numerical summary of CSF total white cell count at each of the time points: wc.csf at baseline and wc.csf.fup at follow-up. What do you find regarding the distributions and value ranges?
Now make a histogram to describe the variation visually. Does CSF total white cell count at each of the time points approximately have a Normal distribution?
If not: consider a suitable transformation of the variable and make the histograms again.
- Create a variable that contains the change in CSF total white cell count between baseline and follow-up. Do the same for the change in the transformed values. Have a look at the distribution of the difference (diff.wc and diff.log.wc), both for the original variable and for the transformed variable.
- Compute the 95% confidence interval for the change in the transformed value of CSF total white cell count, based on the quantiles of the Student’s t distribution. In R, the t.test function can be used. Is there a change in CSF total white cell count between baseline and follow-up? Compare the results with those obtained when the change in the original values is studied.
- You might also want to plot them. This is done via function plotttest in package nhstplot.
- Do the calculations to obtain the 95% confidence interval yourself and compare the results with those from the t.test function. Use the function introduced in Exercise II. There are missing values, which we need to remove before we can apply the functions.
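The hand calculation of the confidence interval can be sketched as below. Since bmData.csv is not reproduced here, the sketch runs on simulated data: all values, the group labels "dexamethasone" and "placebo", the choice of the natural logarithm as transformation, and the direction of the change (follow-up minus baseline) are assumptions made for illustration only. The quantile comes from base R's qt, which for a standard t distribution plays the same role as the quantile counterpart of the function from Exercise II.

```r
set.seed(1)

# Simulated stand-in for bmData.csv: values and group labels are made up
n <- 60
bmData <- data.frame(
  group      = rep(c("dexamethasone", "placebo"), each = n / 2),
  wc.csf     = round(exp(rnorm(n, mean = 8, sd = 1))),
  wc.csf.fup = round(exp(rnorm(n, mean = 6, sd = 1)))
)
bmData$wc.csf.fup[c(3, 10)] <- NA   # mimic missing follow-up measurements

# Restrict to the dexamethasone arm
dex <- subset(bmData, group == "dexamethasone")

# Change on the log scale (follow-up minus baseline), then drop missing values
diff.log.wc <- log(dex$wc.csf.fup) - log(dex$wc.csf)
d <- diff.log.wc[!is.na(diff.log.wc)]

# 95% confidence interval by hand: mean +/- t quantile * standard error
m  <- mean(d)
se <- sd(d) / sqrt(length(d))
ci_by_hand <- m + c(-1, 1) * qt(0.975, df = length(d) - 1) * se

# The same interval from t.test, which removes missing values itself
ci_t_test <- t.test(diff.log.wc)$conf.int

ci_by_hand
ci_t_test
```

With the real data, the same steps apply after replacing the simulated data frame with the imported bmData; the two intervals should agree up to rounding.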